*Constructs instruments
*Also estimates the upper-tier elasticity of substitution (EoS), which is used as an input
*Also computes price indices, which are used as an input
*Also constructs independent and dependent variables

*bilateral trade data
use oecd_gravity_data_intermediates.dta, clear
sort year originId sectorId

save temp.dta, replace

*Trade elasticity (theta) estimates from GYY
use theta_estimates_gyy.dta, clear
rename theta_gyy_pml theta
keep sectorId theta se_gyy

*Replace non-manufacturing theta (sectorId<3) in the baseline with the median elasticity across the remaining sectors
replace theta=. if sectorId<3
egen theta_non = median(theta)
replace theta=theta_non if sectorId<3
drop theta_non

merge 1:m sectorId using temp.dta
drop _merge
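
*Optional check (a sketch, not part of the original pipeline): report how many
*observations still lack a trade elasticity after the merge, and in which sectors.
*This is purely diagnostic and does not modify the data.
quietly count if missing(theta)
display as text "Observations with missing theta after merge: " r(N)
tabulate sectorId if missing(theta)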


*Bringing in Nunn Data
merge m:1 countryISO using nunn_aggregate.dta
drop if _merge==2
drop _merge

sort originId destId sectorId year
*drop non-traded sectors
keep if sectorId<18
save temp.dta, replace


**************************************************************************
*Constructing Sectoral Price Indices
*************************************************************

*Taking logs
gen logX=log(X)
drop if logX==.
egen E = total(X), by(destId sectorId year)
gen logE = log(E)

*Also need consumption expenditures for the estimation with intermediate goods
egen E_C = total(X_C), by(destId sectorId year)
gen logE_C = log(E_C)

*******************
*Baseline Values of Theta
*******************
*Log trade shares, scaled by the trade elasticity, for each pair-sector-year
gen ts_theta = (logX-logE)/theta
*Price index for each destination-sector-year: the mean of the scaled log trade shares across origins
egen logprice_base=mean(ts_theta), by(year destId sectorId)


****************************
*Estimating the EoS
****************************

*Keep only domestic (originId==destId) observations
keep if originId==destId

keep year sectorId destId logprice* logE pop logE_C
sort year destId sectorId
duplicates drop
 
gen logpop = log(pop)
egen clusterId=group(destId)


*Note: we only use the 2010 data to estimate the elasticity of substitution

************************************
*Tables B.1 and B.2
************************************
 
*Baseline
local dvars logE
local indvars logprice_base
local instruments c.logpop#sectorId

*OLS
 reghdfe `dvars' `indvars' if sectorId>2 & year==2010 & destId<62, cluster(clusterId) ///
a(destId sectorId)
outreg2 using table2.xml, replace excel noaster dec(2)   nor2 addstat(Within R-squared, e(r2_within)) nopa

*IV
 ivreghdfe `dvars' (`indvars'=`instruments') if sectorId>2 & year==2010, cluster(clusterId) ///
a(sectorId destId)
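*Optional (a sketch under an assumption that the original does not state): if the
*coefficient on the price index is read as (1 - sigma), as the E divided by P^(1-sigma)
*term in the instrument construction further down suggests, the implied sigma is
display as text "Implied sigma (assuming coefficient = 1 - sigma): " %5.2f (1 - _b[`indvars'])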
outreg2  using table2.xml, append excel nor2 dec(2)  noaster nopa

*First Stage Regression
reghdfe `indvars' `instruments' if sectorId>2 & year==2010, cluster(clusterId) ///
a(destId sectorId)
outreg2 using table2_fs.xml, replace excel noaster dec(2)  nor2 addstat(Within R-squared, e(r2_within))

*Now doing consumption expenditures rather than total expenditures, for the intermediate goods instrument
local dvars logE_C
local indvars logprice_base
local instruments c.logpop#sectorId

*IV
 ivreghdfe `dvars' (`indvars'=`instruments') if sectorId>2 & year==2010, cluster(clusterId) ///
a(sectorId destId)

*First Stage Regression
reghdfe `indvars' `instruments' if sectorId>2 & year==2010, cluster(clusterId) ///
a(destId sectorId)

**************************************************************************
*Building the CES instruments

*Sigmas (from estimation results)
local sigma_base=0.87
local sigma_intermediate=1.78

*Instruments (log L + log beta, where L is proxied by population and beta is the demand residual,
*normalized to sum to one within each destination-year)

foreach x in base {
gen comp1 = exp(logE)/(exp(logprice_`x'))^(1-`sigma_`x'') if sectorId>2
egen comp=total(comp1),by(destId  year)
gen beta_`x' = comp1/comp
gen instrument_`x' = log(beta_`x')+logpop
drop comp1 comp	
	}
	
*Now with consumption expenditures for the intermediate goods instrument
gen comp1 = exp(logE_C)/(exp(logprice_base))^(1-`sigma_intermediate') if sectorId>2
egen comp=total(comp1),by(destId  year)
gen beta_intermediate = comp1/comp
gen instrument_intermediate = log(beta_intermediate)+logpop
drop comp1 comp
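
*Optional check (a sketch, not in the original): by construction the demand shares
*beta should sum to one within each destination-year. Groups with no usable data
*will show a total of zero. chk_beta_sum is a hypothetical helper variable and is
*dropped immediately.
egen double chk_beta_sum = total(beta_intermediate), by(destId year)
summarize chk_beta_sum
drop chk_beta_sum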

************************************************************************************************************
*Now merging back into the main dataset, associating them with origins rather than destinations

keep sectorId year destId logprice_base instrument*
rename destId originId

merge 1:m originId sectorId year using temp.dta, ///
	nogen

save int_reg_data.dta, replace


*Next, need to reshape prices to construct price indices
keep if originId==destId
keep originId year sectorId logprice_base
reshape wide logprice_base, i(originId year) j(sectorId)

merge 1:m originId year using int_reg_data.dta, ///
	nogen

drop logprice_base
save int_reg_data.dta, replace

**************************************************************************************************
*Constructing Independent and Dependent Variables under various assumptions
*************************************************************************************************
*In the presence of intermediates and capital, need to adjust both independent and dependent variables for the factor prices
*Need to construct an intermediate goods price index for each sector-originId
*First need to construct proxies for non-traded goods prices using traded goods prices, wages, rental rates and factor shares

*First making matrices and vectors from the factor share dataset
use average_factor_shares.dta, clear
sort sectorId
*Input-output matrix (32x32).  Note that rows are the using (purchasing) sectors and columns are the selling sectors
mkmat intshare_*, matrix(B)
*Submatrix of non-traded inputs to non-traded sectors
matrix B_NN = B[18..32,18..32]
*Submatrix of traded inputs to non-traded sectors
matrix B_NT = B[18..32,1..17]
*Submatrix of non-traded inputs to traded sectors
matrix B_TN = B[1..17,18..32]
*Submatrix of traded inputs to traded sectors
matrix B_TT = B[1..17,1..17]
*Vector of labor shares of gross output
gen lab_share_go = wa_va_share*w_avg_lab_share
mkmat lab_share_go , matrix(AB)
matrix AB=AB[18..32,1]
*Vector of capital shares of gross output
gen cap_share_go = wa_va_share*(1-w_avg_lab_share)
mkmat cap_share_go, matrix(BA)
matrix BA = BA[18..32,1]
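
*Optional check (a sketch, not in the original): ASSUMING the intermediate shares in
*intshare_* are expressed as shares of gross output, they would be expected to sum
*with the labor and capital shares to roughly one in each sector. chk_int_total and
*chk_share_sum are hypothetical helper variables and are dropped immediately.
egen double chk_int_total = rowtotal(intshare_*)
gen double chk_share_sum = chk_int_total + lab_share_go + cap_share_go
summarize chk_share_sum
drop chk_int_total chk_share_sum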

*Now construct the input price index for each country-sector

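*Start from a one-row placeholder file that the loop below appends to. The placeholder
*row has missing originId and sectorId, so it is not matched in the later merge
clear
set obs 1
gen zzz=.
save price_index, replace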

forvalues x=1/61	{
use int_reg_data.dta, clear
keep if originId==`x' & destId==1 & year==2010
sort sectorId
keep originId sectorId nominal_wage_wbill nominal_rental_rbill logprice_base*
*Vector of log prices in traded goods sectors
mkmat logprice_base*, matrix(TP)
matrix TP = TP[1..1,1..17]

*Scalar of the nominal wage
scalar logwage = log(nominal_wage_wbill[1])
*Scalar of nominal rental rate
scalar logrental = log(nominal_rental_rbill[1])

*Now we are ready to make the vector (15x1) of non-traded log prices, which solves
*NTP = B_NN*NTP + logwage*AB + logrental*BA + B_NT*TP'
matrix NTP = inv(I(15)-B_NN)*(logwage*AB+logrental*BA+B_NT*TP')
*Nontraded goods price index for traded sectors
matrix NTPI=B_TN*NTP
*Traded goods price index for traded sectors
matrix TPI = B_TT*TP'
*Total intermediate input price index for traded goods sectors
matrix PI= NTPI+TPI

*Saving results
clear
set obs 1
svmat PI
gen originId=`x'
gen sectorId=_n
rename PI1 int_price_index
append using price_index.dta
save price_index.dta, replace
}
drop zzz

*Now merging the price index data back into the estimation dataset

merge 1:m originId sectorId using int_reg_data.dta
keep if _merge==3
drop _merge

save regression_data_int.dta, replace

***********************************************************************************************
*Generating independent and dependent variables under various assumptions
****

*Dependent variable with no intermediates
gen logX_theta=log(X)/theta

*Dependent variable in the presence of intermediates
*Generate adjusted dependent variable using nominal wages, nominal rental rates and the intermediate goods price index
gen logX_adj = log(X)/theta+int_price_index +wa_va_share*w_avg_lab_share*log(nominal_wage_wbill)+wa_va_share*w_avg_cap_share*log(nominal_rental_rbill)
label var logX_adj "logZ, the dependent variable in the presence of intermediates and capital"

*Now construct independent variables for all cases
*Baseline: no intermediates
egen X_s = total(X), by(year originId sectorId)
gen logX_s = log(X_s)
gen logL=log(X_s/nominal_wage_go)
label var logL "Log L_ik under the baseline assumption of labor as the only factor of production"

*Independent variable with capital and intermediates, when scale economies (SE) are in gross output
*This is our measure of the "gross input"
gen logZ_adj = log(X_s) - int_price_index -wa_va_share*w_avg_lab_share*log(nominal_wage_wbill)-wa_va_share*w_avg_cap_share*log(nominal_rental_rbill)
label var logZ_adj "Gross input measure, under the presence of intermediates and capital"

*Now generate adjusted independent variable when scale economies (SE) are in labor only
*This is the estimated labor input when taking into account the existence of intermediates and capital
gen logL_adj =log(X_s*w_avg_lab_share*wa_va_share/nominal_wage_wbill)
label var logL_adj "Log L_ik under the assumption of capital and intermediates"

*Cleaning up, labeling variables
gen logX=log(X)
keep  originId destId sectorId year countryISO instrument* logX* logL* logZ* Qc ln_credit_banks real_wage_pwt theta* pop se_gyy

label var instrument_base "Demand shock instrument, baseline assumptions"
label var instrument_intermediate "Demand shock instrument, intermediates and capital"
label var logX_theta "Log Sales from Origin to Destination in Sector-year, divided by the trade elasticity"


sort originId destId sectorId year
order originId destId sectorId year
 
save regression_data.dta, replace






 
 
 


